DETAILED ACTION

In response to Applicant's remarks filed 5/21/2008, claim 16 is cancelled. Claims 1-15 & 17-45 
are pending. 

Continued Examination Under 37 CFR 1.114 

1. A request for continued examination under 37 CFR 1.114, including the fee set forth in
37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible 
for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has 
been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 
CFR 1.114. Applicant's submission filed on 5/21/2008 has been entered. 

Claim Rejections - 35 USC § 103 

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

3. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 
(1966), that are applied for establishing a background for determining obviousness under 35 
U.S.C. 103(a) are summarized as follows: 

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating obviousness 
or nonobviousness. 
4. Claims 1, 2, 17, 18, 20, 23-25, 28, 29, 40, & 41 are rejected under 35 U.S.C. 103(a) as
being unpatentable over Stelovsky (US 5,782,692), hereinafter known as Stelovsky, in view of
Wang (US 2002/0133764 A1), hereinafter known as Wang, and further in view of Hansen et al.
(US 2002/0038456 A1), hereinafter known as Hansen.

5. Stelovsky teaches a processor-readable medium comprising executable instructions for 
personalizing karaoke (Column 1, Lines 54-67), comprising: segmenting visual content to 
produce a plurality of sub-shots, where the instructions for segmenting visual content segment 
video, and segmenting music to produce a plurality of music sub-clips (multimedia presentation 
track consisting of video, audio, and text display is segmented with respect to specific beginning 
and ending points, Column 3, Lines 27-65); selecting important sub-shots from within the 
plurality of sub-shots (Column 3, Lines 52-60; it is understood that the selected sub-shots are 
important to the user); and displaying at least some of the plurality of sub-shots as a 
background to lyrics associated with the plurality of music sub-clips ("Karaoke Game" 
presentation has synchronized video and instrumental sound tracks, Column 9, Lines 15-21; the 
text can be superimposed on the video, Column 10, Lines 5-6). [Claim 1]. 

6. Stelovsky teaches a processor-readable medium comprising instructions for providing 
lyrics for integrating lyrics, music, and video content suitable for karaoke, comprising 
instructions for: receiving a request for a file associated with a specific song (clicking on a word 
in the text track, Column 14, Lines 42-48), wherein the file comprises music, lyrics, and timing 
values (The time-dependent sequence is composed of tracks that are synchronized with respect 
to a common time axis {hereinafter "multimedia presentation"}. The basic track consists of video 
display images and is synchronized with at least one other track that consists of audio or text 
display, 3:31-35; The multimedia presentation is segmented with respect to specific beginning 
and ending points of segments on the time axis, i.e. there are one or more points of time that 
partition the time axis into time segments, 3:52-55), and fulfilling the request by sending the file
associated with the specified song (connection is established with a remote on-line service, 
search query initiated, and results are displayed, Column 14, Lines 42-48), segmenting visual 
content to produce a plurality of sub-shots of a length corresponding to the music sub-clips 
(multimedia presentation track consisting of video, audio, and text display is segmented with 
respect to specific beginning and ending points, Column 3, Lines 27-65), and outputting the 
plurality of music sub-clips together with corresponding sub-shots of visual content, which is 
configured as a background to the lyrics associated with the music sub-clips ("Karaoke Game" 
presentation has synchronized video and instrumental sound tracks, Column 9, Lines 15-21; the 
text can be superimposed on the video, Column 10, Lines 5-6) [Claim 23]. 
7. Stelovsky teaches a personalized karaoke device, comprising: a music analyzer 
configured to create music sub-clips of varying lengths according to a song (Segmentation 
Authoring System {SAS} facilitates the identification of points in time where a segment starts 
and ends, Column 5, Line 62 to Column 6, Line 2; multimedia presentation track consisting of 
video, audio, and text display is segmented with respect to specific beginning and ending points, 
Column 3, Lines 27-65); a visual content analyzer configured to define and select visual content 
sub-shots (Using SAS, the author partitions the multimedia presentation into time segments 
according to predominant time units, e.g., measures of song, sound bites, or action sequences 
in a movie, Column 6, Lines 51-54); a lyric formatter configured to time delivery of syllables of 
lyrics of the song (evaluation feedback of user's input includes visualization of differences in 
pronunciation patterns, processes involved in generating {human} speech, such as positions of 
the tongue and airflow patterns, Column 14, Lines 52-59; it is inherent that the speech analysis 
as disclosed could recognize syllables and sentences, which are pronunciation patterns;
sections of the text track are linked to the time segments, Column 6, Line 55); and a composer
configured to assemble the music sub-clips with the visual content sub-shots, and configured to
adjust the length of the sub-shots to correspond to the music sub-clips, and to superimpose the 
syllables of the lyrics of the song over the sub-shots ({SAS} sections of a text track and 
additional media resources are linked to the time segments, Column 6, Lines 55-57) [Claim 25]. 
8. Stelovsky teaches an apparatus, comprising: means for creating music sub-clips 
according to a song, and means for defining and selecting visual content sub-shots (multimedia 
presentation track consisting of video, audio, and text display is segmented with respect to 
specific beginning and ending points, Column 3, Lines 27-65); means for timing delivery of 
syllables of lyrics of the song (sections of the text track are linked to the time segments, Column 
6, Line 55; the text can be superimposed on the video, Column 10, Lines 5-6, also Column 14, 
Lines 52-59 and Column 9, Lines 15-21); and means for assembling the music sub-clips with
the visual content sub-shots and adjusting the length of the sub-shots to correspond to the
length of the music sub-clips (the music video is synchronized with a song's audio as well as the
song's lyrics, and partitioned into time segments that correspond to the song's phrases, Column
8, Lines 34-45), and to superimpose the syllables of the lyrics of the song over the sub-shots
(While the song is playing, the corresponding phrases are highlighted in the lyrics field. If 
necessary, the lyric's field is automatically scrolled to reveal the current phrase, Column 8, Lines 
34-45) [Claim 40]. 

9. What Stelovsky fails to teach is where the segmenting of music to produce a plurality of 
music sub-clips establishes boundaries between the music sub-clips at beat positions within the 
music [Claims 1, 23, 25, & 40], and wherein each sub-clip has a duration that is a function of 
song tempo [Claim 28]. However, Wang teaches a method of detecting beats in a music stream 
(Beat is defined in the relevant art as a series of perceived pulses dividing a musical signal into 
intervals of approximately the same duration. Beat detection can be accomplished by any of 
three methods. The preferred method uses the variance of the music signal, which variance is
derived from decoded Inverse Modified Discrete Cosine Transformation (IMDCT) coefficients. 
The variance method detects primarily strong beats. The second method uses an Envelope 
scheme to detect both strong beats and offbeats. The third method uses a window-switching 
pattern to identify the beats present. The window-switching method detects both strong and 
weaker beats. In one embodiment, a beat pattern is detected by the variance and the window 
switching methods. The two results are compared to more conclusively identify the strong beats 
and the offbeats, Para. 0070-0074; see also Figure 7, the numbered delta functions are 
understood to be detected beats), and segmenting the music stream at beat boundaries (A 
normal, error-free audio transmission is represented in the top graph {of Figure 6} by a first and 
second beat-to-beat interval waveform. The first waveform includes a first beat and the audio 
information up to a second beat. Similarly, the second waveform includes the second beat and 
the audio information up to a third beat; In accordance with the method of the present invention, 
a replacement waveform, including a replacement beat, is copied from the first beat and the first 
waveform; and is substituted for the missing audio segment in the time interval to t 2 , as 
shown in the bottom graph; all at Para. 0058-0069; see also Figure 6). The beat intervals are 
taught by Wang to be a function of song tempo (the beat-to-beat interval is replaced by the 
audio data frames from a corresponding beat-to-beat interval in a preceding 4/4 bar. Most 
popular music has a rhythm period in 4/4 time, Para. 0067; 4/4 time is understood to be a 
tempo). Any of the three methods taught by Wang would be used to detect beats in a music clip, 
and Wang's method of copying and pasting music waveforms segmented at beat positions
would be used to align video, still pictures, music, and lyrics along those boundaries, in the
manner taught by Stelovsky. Therefore, it would have been obvious to one of ordinary skill in
the art, at the time the invention was made, to have used Wang's methods of segmenting
music to produce a plurality of music sub-clips, establishing boundaries between the music
sub-clips at beat positions within the music, with the methods of Stelovsky for integrating lyrics,
music, and video content suitable for karaoke, in order to exploit the beat pattern of music 
signals to improve the presentation of music when transferred over a network [Claims 1 , 23, 25, 
28, & 40]. 
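For illustration only, the following Python sketch shows the general idea of establishing music sub-clip boundaries at detected beat positions. It is a simplified stand-in, not Wang's method: Wang's preferred detector uses the variance of decoded IMDCT coefficients, whereas this sketch assumes raw sample frames and a plain energy-threshold detector. The function names, the 43-frame window, and the factor c = 1.5 are hypothetical choices.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Sum of squared samples in each non-overlapping frame (a trailing partial frame is dropped)."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_beats(energies: Sequence[float], window: int = 43,
                 c: float = 1.5) -> List[int]:
    """Hypothetical energy detector: frame i is flagged as a beat when its
    energy exceeds c times the mean energy of the preceding `window` frames."""
    beats = []
    for i in range(window, len(energies)):
        local_mean = sum(energies[i - window:i]) / window
        if energies[i] > c * local_mean:
            beats.append(i)
    return beats


def segment_at_beats(beats: Sequence[int], n_frames: int) -> List[Tuple[int, int]]:
    """Split frames [0, n_frames) into contiguous sub-clips whose boundaries fall on beats."""
    bounds = [0] + [b for b in beats if 0 < b < n_frames] + [n_frames]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]
```

Each resulting (start, end) pair is a music sub-clip whose boundaries coincide with beat positions, which is the property at issue in Claims 1, 23, 25, & 40.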

10. What Stelovsky and Wang fail to teach is selecting sub-shots such that they are 
uniformly distributed within the video [Claims 1, 23, 25 & 40]. However, Hansen teaches a 
system and method for automatically producing media content by creating video subclips called 
"microchannels" by a "microchannel creator" that determines the desired channel content based 
upon uniform distribution of video, video and audio, still images and mosaics of different 
locations (The channel creator 210 then accesses the individual clips from the database and 
creates the continuous stream or "microchannel." The continuous stream is defined by a 
concatenated stream of output, whether it be a series of images, video and audio, or other 
forms of media; The microchannel creator makes the following decisions when creating a 
microchannel: (i) what type of media should be sent at a given time (video, audio, image); (ii) 
what triggers should be given priority, assuming multiple triggers are defined for the 
microchannel; (iii) when advertising should be inserted into the video stream, and what 
advertising should be provided; and (iv) when the database should be accessed for pre- 
recorded clips that are not currently posted to the microchannel as new clips. The channel 
creator runs via decision algorithms that are determined by the desired channel content for the 
microchannel. This is best illustrated by example. Considering a hypothetical travel-related site, 
the following type of microchannel might be desired: (i) commercials should be presented once 
per minute in ten second maximum durations; (ii) uniform distribution of video, video and audio, 
still images and mosaics of different locations; (iii) emphasis on video content using activity 
triggers on beach cams and urban cams; (iv) emphasis on mosaic content using periodic
triggering without motion for panoramic cameras; (v) emphasis on still image content for interior 
cameras, such as restaurant cameras; (vi) live, real-time clips during daylight hours; and (vii) 
pre-recorded clips during night hours when beach activity has ceased, Para. 0085-88). As best 
understood, Hansen teaches selecting "microchannels" uniformly from a source. The 
"microchannel creator" of Hansen would be used in the device of Stelovsky to uniformly select
video and photographic content. Therefore, it would have been obvious to one of ordinary skill in
the art, at the time the invention was made, to have selected sub-shots such that they are uniformly
distributed within the video, as taught by Hansen, in the device of Stelovsky, in light of Wang, in
order to automatically produce and distribute media content to a targeted audience and to
provide more interesting and representative content [Claims 1, 23, 25 & 40].
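What "uniformly distributed" selection means can be illustrated with a short Python sketch. It assumes, hypothetically, that candidate sub-shots are indexed in temporal order and that k of them are to be kept; it picks the sub-shot at the centre of each of k equal-width bins spanning the video. This is not Hansen's microchannel creator, only an illustration of the selection property.

```python
from typing import List


def select_uniform(n_available: int, k: int) -> List[int]:
    """Pick k of n_available sub-shot indices spread evenly across the video."""
    if k <= 0 or n_available <= 0:
        return []
    if k >= n_available:
        return list(range(n_available))
    # Index at the centre of each of k equal-width bins over [0, n_available).
    return [int((i + 0.5) * n_available / k) for i in range(k)]
```

Because each bin is at least one index wide when k < n_available, the chosen indices are distinct and span the whole video rather than clustering at its start.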

11. Stelovsky teaches instructions for shortening some of the plurality of sub-shots to a
length of a corresponding music sub-clip (the system displays the current segment's start and 
end points, so the author can select and edit the boundary points, Column 7, Lines 14-19) 
[Claim 2]. 

12. What Stelovsky, Wang, and Hansen further fail to explicitly teach is wherein the 
segmenting music comprises instructions for bounding the sub-clip's length according to: 
minimum length = min(max(2*tempo,2),4) and maximum length = minimum length+2 [Claim 17], 
or establishing the music sub-clip's length within a range of 3 to 5 seconds [Claim 18]. However, 
Applicant has not disclosed that having (min(max(2*tempo,2),4) < length < 
min(max(2*tempo,2),4)+2) or (3 < length < 5) seconds solves any stated problem or is for any 
particular purpose. Moreover, it appears that the arbitrary length of the sub-clips of Stelovsky or 
the Applicant's instant invention would perform equally well for synchronizing the sub-clips with 
a video. Accordingly, it would have been obvious to one of ordinary skill in the art, at the time 
the invention was made, to have modified Stelovsky such that the music sub-clips had a rigid
minimum and maximum length, in light of Wang and Hansen, because such a modification 
would have been considered a mere design consideration, which fails to patentably distinguish 
over Stelovsky [Claims 17 & 18]. 
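The Claim 17 bounds can be worked through numerically. In the following Python sketch, the claim's variable tempo is assumed to be expressed in seconds (the claim does not state units), so the bounds are also in seconds; the function name is hypothetical. The minimum length is 2*tempo clamped to the interval [2, 4], and the maximum is two units longer.

```python
from typing import Tuple


def sub_clip_bounds(tempo: float) -> Tuple[float, float]:
    """Return (minimum, maximum) sub-clip length per the Claim 17 expressions:
    minimum = min(max(2*tempo, 2), 4), maximum = minimum + 2."""
    minimum = min(max(2 * tempo, 2), 4)
    return minimum, minimum + 2
```

For example, tempo = 1.5 gives (3, 5), which coincides with the 3-to-5-second range of Claim 18; for any tempo value the minimum lies between 2 and 4 and the maximum between 4 and 6.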

13. Stelovsky teaches instructions for obtaining lyrics from a file (textual track can be 
generated remotely and transmitted using communications means, Column 14, Lines 20-24); 
and coordinating delivery of the lyrics with the music using timing information contained within 
the file (Column 3, Lines 52-65) [Claim 20]. 

14. Stelovsky teaches wherein obtaining lyrics comprises instructions for sending the file 
over a network to a karaoke device (textual track can be generated remotely and transmitted 
using communications means, Column 14, Lines 20-24; on-line services provide downloading of 
files, e.g. Internet, Column 6, Lines 49-50) [Claim 24]. 

15. Stelovsky teaches wherein the visual content analyzer is configured to segment video 
into sub-shots (Column 6, Lines 51-54) [Claim 29]. 

16. Stelovsky teaches wherein the means for defining and selecting visual content sub-shots 
is a video analyzer configured to segment video into sub-shots (Using SAS, the author partitions 
the multimedia presentation into time segments according to predominant time units, e.g., action 
sequences in a movie, Column 6, Lines 51-54) [Claim 41]. 

17. Claims 3 & 8 are rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, 
in view of Wang and Hansen, and further in view of Golin (US 5,990,980), hereinafter known as 
Golin. 

18. Stelovsky, Wang, and Hansen teach all the features as described above in the rejection 
of claim 1. What Stelovsky, Wang, and Hansen fail to teach is wherein segmenting the visual 
content comprises instructions for: dividing a shot into two sub-shots at a maximum peak of a
frame difference curve; and repeating the dividing to result in sub-shots shorter than a maximum 
sub-shot length [Claim 3]. However, Golin teaches the use of a Frame Dissimilarity Measure 
(FDM), which is the ratio of a net dissimilarity measure and a cumulative dissimilarity measure 
of two consecutive frames (Column 3, Line 65 to Column 4, Line 12). The processing of sub- 
shots uses the FDM to identify transitions between shots in a video sequence, which appear as 
peaks in the FDM data (Column 5, Lines 21-42). The data analysis for the sub-shot dividing is a 
loop, which starts with frames at the beginning of the video sequence and scans through the 
data to the frames at the end of the sequence (Column 5, Lines 54-62). The length of the entire 
video sequence is a maximum sub-shot length. Therefore, it would have been obvious to one of 
ordinary skill in the art, at the time the invention was made, to have used the FDM peak analysis 
of dividing sub-shots, as described in Golin, for the video segmenting used in Stelovsky, in light 
of Wang and Hansen, in order to more effectively detect gradual transitions between sub-shots
[Claim 3]. 
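The dividing step recited in Claim 3 can be sketched as a recursive split. In this Python illustration, which is not Golin's algorithm, frame_diff[k] is assumed to hold the difference between frames k-1 and k (frame_diff[0] is unused), and max_len is a hypothetical maximum sub-shot length in frames of at least 1. A shot longer than max_len is cut at its largest frame-difference peak, and each half is processed again.

```python
from typing import List, Sequence, Tuple


def split_at_peaks(frame_diff: Sequence[float], start: int, end: int,
                   max_len: int) -> List[Tuple[int, int]]:
    """Recursively split frames [start, end) at the maximum frame-difference
    peak until every sub-shot is at most max_len frames long."""
    if end - start <= max_len:
        return [(start, end)]
    # Candidate cut points are interior frames; cutting at k makes frame k
    # the first frame of the second sub-shot.
    k = max(range(start + 1, end), key=lambda i: frame_diff[i])
    return (split_at_peaks(frame_diff, start, k, max_len)
            + split_at_peaks(frame_diff, k, end, max_len))
```

Each cut falls strictly inside the current range, so both halves are shorter and the recursion terminates with every sub-shot no longer than max_len.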

19. Stelovsky teaches where a sub-shot comprises a video of at least a predetermined 
length based on the length of a music sub-clip (The recording creates a new "user's voice" 
sound track. As the beginning of this track is well known, the track is synchronized with the 
other tracks of the presentation. As a consequence, the "user's voice" sound track is partitioned 
into the same time segments as the other tracks, Column 9, Lines 31-37). What Stelovsky and 
Wang further fail to teach is wherein each sub-shot comprises a segment of video of at least a 
predetermined length based on the length of the music sub-clips and segmented based on a 
magnitude of difference between adjacent frames [Claim 8]. However, Hansen teaches a 
system and method for automatically producing media content, in which the clip has a 
predetermined minimum length {one still frame}, based on detected trigger events in the clip (A 
"clip" may be defined as a duration of time when the triggers that are set for the capture system
are activated, such as when there is motion in the scene and the trigger is set to a basic motion
cue. The clip preferably ends when the trigger event is no longer detected or when a certain 
time period expires, although other more sophisticated methods for trigger intervals may also be 
utilized. Once a clip is delineated, the content is generated. At a minimum, the content includes 
one still image that represents the trigger event in action. For example, 15 seconds out of one 
minute of captured content may be identified as qualifying content, Para. 0043). What 
Stelovsky, Wang, and Hansen fail to explicitly teach is where the trigger events are based on 
the length of music sub-clips and segmented based on a magnitude of frame difference [Claim 
8]. However, Golin teaches the use of an FDM to segment video (Column 3, Line 65 to Column 
4, Line 12; Column 5, Lines 21-42 and Lines 54-62). The FDM of Golin is the magnitude of 
dissimilarity between two consecutive frames of a video, as demonstrated above. This FDM 
would be used as a trigger event, as described in Hansen, when used to determine the length of 
a video sub-shot in the system and method of time-segmenting taught by Stelovsky. Therefore,
it would have been obvious to one of ordinary skill in the art, at the time the invention was made,
to have used the frame dissimilarity measure of Golin to determine a sub-shot length in the
system and method of Stelovsky, in light of the teachings of Wang and Hansen, in order to
synchronize audio tracks with gradual transitions between shots in a video and to parse a
video for segmentation that does not have abrupt shot transitions [Claim 8].

20. Claims 4-7, 10, 32, & 33 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Stelovsky, in view of Wang and Hansen, and further in view of Osberger (US 6,670,963),
hereinafter known as Osberger.

21. Stelovsky, Wang, and Hansen teach all the features as described above in the rejection
of claims 1 & 25. What Stelovsky, Wang, and Hansen fail to teach is wherein the filtering of a 
plurality of sub-shots is according to importance or quality [Claim 4]. However, Osberger
teaches giving areas of medium motion high importance (Column 7, Lines 10-21). Osberger 
also teaches that areas of low texture (quality) such as faces are strong attractors of attention 
(Column 8, Lines 40-54). The sub-shots that are high in "regions of interest", or attention 
attracting, are identified (filtered) as taught by Osberger (Column 2, Lines 24-41). Therefore, it 
would have been obvious to one of ordinary skill in the art, at the time the invention was made, 
to have used the methods of Osberger for filtering sub-shots based on attention indices such as
importance to the camera and texture quality, in the karaoke video segmenting device of 
Stelovsky, in order to increase the entertainment value of the karaoke experience to a user 
[Claim 4]. 

22. What Stelovsky, Wang, and Hansen further fail to teach is wherein filtering the plurality 
of sub-shots according to importance comprises instructions for evaluating frames within a sub- 
shot according to attention indices, and averaging the attention indices for the frames to 
determine if the sub-shot should be included [Claim 6]. However, Osberger teaches identifying 
and adaptively segmenting frames of video based upon an attention model, also called a total
importance map, composed by linear weighting of the spatial and temporal importance maps
(Column 2, Lines 24-41). It is inherent that averaging is merely linear weighting with equal weight
factors that sum to one. Therefore, it would have been obvious to one of ordinary skill in the art, at the time
the invention was made, to have utilized the averaging of the attention indices of Osberger to 
select frames of importance, for use in the karaoke system of Stelovsky, in light of Wang and 
Hansen, in order to adapt the attention model for a variety of different types of video sub-shots, 
while accurately determining regions of interest in the videos [Claim 6]. 
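The relationship between averaging and linear weighting can be shown concretely. In this Python sketch, the per-frame attention indices and the inclusion threshold are hypothetical inputs; the average of N indices is computed as a linear weighting with equal weights of 1/N, and a sub-shot is kept when that average meets the threshold.

```python
from typing import Sequence


def weighted_index(values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear weighting of per-frame attention indices."""
    return sum(w * v for w, v in zip(weights, values))


def average_index(values: Sequence[float]) -> float:
    """Averaging expressed as linear weighting with equal weights 1/N (summing to one)."""
    n = len(values)
    return weighted_index(values, [1.0 / n] * n)


def keep_sub_shot(frame_indices: Sequence[float], threshold: float) -> bool:
    """Hypothetical inclusion rule: keep the sub-shot if its mean attention index meets the threshold."""
    return average_index(frame_indices) >= threshold
```

Any other choice of non-negative weights summing to one yields a weighted mean; the equal-weight case is the ordinary average.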

23. What Stelovsky, Wang, and Hansen further fail to teach is wherein filtering the sub-shots 
according to importance comprises instructions for analyzing the camera motion, object motion, 
and specific objects within the sub-shots, and filtering the sub-shots according to the analysis
[Claim 7], or wherein a visual content analyzer is configured to select from the sub-shots 
according to ranked importance, gauged by detection of color entropy, object motion, camera 
motion, or of a face within the sub-shot [Claims 10 & 32]. However, Osberger teaches selecting 
or filtering sub-shots by color information (Column 3, Lines 6-15), by camera or object motion 
(Column 7, Lines 7-37), or by specific objects, including faces, in a sub-shot (Column 8, Lines 
40-54). Therefore, it would have been obvious to one of ordinary skill in the art, at the time the 
invention was made, to have used the various color, motion, and object detection in the video 
sub-shots, as described by Osberger, in the personalized karaoke system of Stelovsky, in light
of Wang and Hansen, in order to improve the prediction of visual importance of a sub-shot 
[Claims 7, 10, & 32]. 

24. What Stelovsky, Wang, and Hansen further fail to teach is wherein filtering the plurality 
of sub-shots comprises instructions for: examining color entropy within each of the plurality of 
sub-shots to detect motion more than a threshold indicating interest and less than a threshold 
indicating low camera and/or object movement; and selecting sub-shots having acceptable 
motion and/or color entropy scores [Claim 5], or wherein the visual content analyzer is 
configured to filter out sub-shots having low image quality, as measured by low entropy and low 
motion intensity [Claim 33]. However, Osberger teaches segmenting frames into regions based 
upon both color and luminance (Column 2, Lines 24-41). The term entropy is taken to mean 
Information Entropy or Shannon Entropy, which refers to a measure of uncertainty associated 
with a random variable. Thus, referring to lossless data compression, the color entropy would 
refer to an average minimum number of bits needed to communicate a color value. Osberger 
teaches using an algorithm to segment an image into homogeneous regions using color 
information, to generate the spatial importance map (Column 3, Lines 6-15). Osberger also 
teaches that, if the spatial importance map is too noisy from frame to frame, a temporal
smoothing operation is performed, and a temporal importance map is generated (Column 6, 
Line 66 to Column 7, Line 37). The temporal importance map is calculated using adaptable 
thresholds because the amount of motion varies greatly across different scenes. Osberger also 
teaches identifying sub-shots with regions of interest by using the spatial and temporal interest 
maps in order to produce an adaptive segmentation model (Column 8, Lines 58-67), for 
segmenting video scenes. Therefore, it would have been obvious to one of ordinary skill in the 
art, at the time the invention was made, to have incorporated the color entropy detection, then 
the camera motion detection of Osberger with the segmentation of karaoke video as described 
by Stelovsky, in light of Wang and Hansen, in order to attract the interest of a karaoke user 
more effectively [Claims 5 & 33]. 
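As an illustration of the Shannon-entropy interpretation relied on above, the following Python sketch computes the entropy, in bits, of the distribution of quantized color values in a frame, and flags a sub-shot as low quality when both its entropy and its motion intensity fall below thresholds. The pixel representation, the motion-intensity input, and both threshold values are hypothetical; Osberger does not describe this code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def color_entropy(pixels: Sequence[Hashable]) -> float:
    """Shannon entropy H = -sum(p * log2(p)) over the observed color values."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_low_quality(entropy: float, motion_intensity: float,
                   entropy_threshold: float = 1.0,
                   motion_threshold: float = 0.1) -> bool:
    """Hypothetical Claim 33-style filter: low entropy and low motion intensity."""
    return entropy < entropy_threshold and motion_intensity < motion_threshold
```

A frame of a single color has entropy 0, while four equally frequent colors give 2 bits, matching the "average minimum number of bits" reading of color entropy.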

25. Claims 12-15, 31, 34, 36, & 37 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Stelovsky, in view of Wang and Hansen, and further in view of Geigel et al. 
(US 2002/0122067 A1), hereinafter known as Geigel. 

26. Stelovsky, Wang, and Hansen teach all the features as demonstrated in the rejection of 
claims 1, 25, & 40 above. What Stelovsky, Wang, and Hansen fail to explicitly teach is wherein 
the instructions for segmenting visual content includes assigning photographs to be sub-shots 
[Claim 12], instructions for assigning photographs includes converting at least one photograph 
to video [Claim 14], wherein the visual content comprises home video and photographs in digital 
formats [Claim 15], wherein a visual content analyzer is configured to assemble still 
photographs, each of which is a sub-shot [Claim 31], and wherein the visual content analyzer is 
configured to define sub-shots from visual content comprising photographic and video content 
[Claim 34]. However, Geigel teaches a layout generator for digital images (Para. 0010), 
including photographs or video clips (Para. 0055), which converts the images into a video
(output is Picture CD media or other photo delivery media, Para. 0057). It is inherent that a 
series of images displayed during a progression of time is a video. Therefore, it would have 
been obvious to one of ordinary skill in the art, at the time the invention was made, to have 
assembled and converted photos to video, as taught by Geigel, for the background video in the
entertainment system of Stelovsky, in light of Wang and Hansen, in order to automate the layout 
of the background in a manner pleasing to the user [Claims 12, 14, 15, 31, & 34]. 

27. What Stelovsky, Wang, and Hansen further fail to teach is wherein a visual content 
analyzer is configured with instructions for assigning photographs includes instructions for: 
rejecting photographs having problems with quality [Claim 13]; and rejecting a similar group of 
photographs when one within the group has been selected [Claims 13 & 37]. However, Geigel 
teaches performing detection of dud images and duplicate images prior to being submitted to 
the layout system (Para. 0061). Therefore, it would have been obvious to one of ordinary skill in
the art, at the time the invention was made, to have excluded dud or duplicate images when
creating the background image layout, as shown by Geigel, when implementing the
entertainment system of Stelovsky, in light of Wang and Hansen, in order to require only
minimal input from the user when assembling images that are aesthetically pleasing to the user
[Claims 13 & 37].

28. What Stelovsky, Wang, and Hansen further fail to teach is wherein a visual content 
analyzer is configured to organize photographs by the date of exposure and scene, thereby 
obtaining photographs having a relationship [Claim 36]. However, Geigel teaches organizing the 
images (page layout algorithm, Para. 0059) by date of exposure (chronology of the images,
Para. 0063) and scene (event clustering, Para. 0060). It is inherent that all the photographs 
would thus be related by a date range or event group. Therefore, it would have been obvious to 
one of ordinary skill in the art, at the time the invention was made, to have organized the images
to the extent provided by Geigel, in the operation of the entertainment system of Stelovsky, in
light of Wang and Hansen, in order to distribute the photographs automatically according to an 
algorithm that valued a user-pleasing arrangement [Claim 36]. 
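Organizing photographs by exposure date and grouping them by scene can be approximated, for illustration, by sorting on capture time and starting a new group whenever the gap between consecutive photographs exceeds a threshold. This Python sketch is a simple time-gap heuristic, not Geigel's event-clustering algorithm; the (name, timestamp) input format and the two-hour gap are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple


def cluster_by_time(photos: Sequence[Tuple[str, datetime]],
                    gap: timedelta = timedelta(hours=2)) -> List[List[str]]:
    """Sort photos by exposure time and start a new group at each gap larger than `gap`."""
    ordered = sorted(photos, key=lambda p: p[1])
    groups: List[List[str]] = []
    last: Optional[datetime] = None
    for name, taken in ordered:
        if last is None or taken - last > gap:
            groups.append([])
        groups[-1].append(name)
        last = taken
    return groups
```

Every photograph in a resulting group lies within a bounded date range of its neighbours, which is the sense in which the grouped photographs are "related".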

29. Claims 19, 39, & 44 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Stelovsky, in view of Wang and Hansen, and further in view of Bloom et al. (US 2005/0042591 
A1), hereinafter known as Bloom. 

30. Stelovsky, Wang, and Hansen teach all the features as demonstrated above in the 
rejections of claims 1, 18, 25, & 40, including wherein the lyric formatter is configured to
consume a file detailing timing of the lyrics (the textual track can be generated remotely and
transmitted by communication means, digitally, using a software program, Column 14, Lines
14-24; the digital textual track used for the karaoke is inherently a file to be "consumed" or used).
Stelovsky teaches wherein evaluation of output can involve differences in pronunciation patterns 
and any processes involved in generating speech (Column 14, Lines 52-59). What Stelovsky, 
Wang, and Hansen fail to teach is wherein segmenting the music comprises a lyric formatter 
configured with instructions for establishing boundaries for the music sub-clips at sentence 
breaks [Claim 19], or consuming a file detailing timing of each syllable and each sentence of the
lyrics [Claims 39 & 44], and for rendering the lyrics syllable by syllable [Claim 44]. However, 
Bloom teaches automatically synchronizing sound to images, wherein lyric segmentation may 
be syllable by syllable (line can be a single word or sound) or a sentence (Para. 0139). 
Therefore, it would have been obvious to one of ordinary skill in the art, at the time the invention 
was made, to have segmented the music of the karaoke system of Stelovsky, in light of the 
syllable and sentence boundaries of the lyrics as taught by Bloom, in light of Wang and Hansen, 
in order to synchronize the song with a user's lip movements on the accompanying video
display [Claims 19, 39, & 44].

31. Claim 21 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, in
view of Wang and Hansen, and further in view of Tsai (US 6,572,381 B1), hereinafter known as
Tsai. 

32. Stelovsky, Wang, and Hansen teach all the features as demonstrated above in the 
rejections of claims 1 & 20 above. What Stelovsky, Wang, and Hansen fail to teach is wherein 
obtaining the lyrics comprises instructions for sending the file over a network to a karaoke 
device as part of a pay-for-play service [Claim 21]. However, Tsai teaches a plurality of karaoke 
terminals connected to a host computer via a network (communications line) that delivers lyric 
data (Column 8, Lines 48-61). Tsai teaches that a karaoke system shares the source data as part of
a pay service (Column 2, Lines 48-56; also Column 20, Line 52 to Column 21, Line 56). 
Therefore, it would have been obvious to one of ordinary skill in the art, at the time the invention 
was made, to have sent the lyrics file over a network in conjunction with a pay-for-play service, 
as taught by Tsai, in the karaoke system of Stelovsky, in light of Wang and Hansen, in order to 
offer commercial messages with updated custom content to a subscriber of a karaoke service 
[Claim 21]. 

33. Claim 22 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, in 
view of Wang and Hansen, and further in view of Tashiro et al. (US 5,703,308), hereinafter 
known as Tashiro. 

34. Stelovsky, Wang, and Hansen teach all the features as demonstrated above in the 
rejections of claim 1 above. What Stelovsky, Wang, and Hansen fail to teach is wherein the 
processor-readable medium comprises instructions for: querying a database of songs by
humming a portion of a desired song; and selecting the desired song from among a number of 
possibilities suggested by an interface to the database [Claim 22]. However, Tashiro teaches a 
karaoke device having a database of songs (music data storage device with a plurality of entry
songs stored in a data table, Column 1, Line 54 to Column 2, Line 3), wherein the database is 
queried by humming a song (key melody patterns which represent a desired song are input by 
voice, Column 3, Lines 10-14) and selecting the desired song through an interface (music 
selection is made from top 10 matching entries, Column 7, Lines 48-67). Therefore, it would 
have been obvious to one of ordinary skill in the art, at the time the invention was made, in the 
karaoke system of Stelovsky, to search and select a desired song from a database by humming, 
as taught by Tashiro, in light of Wang and Hansen, in order to select a song even if neither the 
artist nor the title of the song is known [Claim 22]. 

35. Claim 26 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, in 
view of Wang and Hansen, and further in view of Trovato et al. (US 7,058,889 B2), hereinafter 
known as Trovato. 

36. Stelovsky, Wang, and Hansen teach all the features as demonstrated above in the 
rejections of claims 1 & 25. What Stelovsky, Wang, and Hansen fail to teach is wherein the music
analyzer is configured to segment the song with a strong onset between each of the music
sub-clips [Claim 26]. However, Trovato teaches locating transition points for a music segmentation
scheme by onset break detection (Column 7, Lines 33-51; also Figure 6). It is inherent from 
Figure 6 that weak onset breaks are not used as transition points. Therefore, it would have been 
obvious to one of ordinary skill in the art, at the time the invention was made, to have analyzed 
the music used in the karaoke system of Stelovsky with the onset break detection method 
defined in Trovato, in light of Wang and Hansen, in order to automatically synchronize the music
with the background video consistent with human perception [Claim 26]. 

37. Claim 27 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, in 
view of Wang and Hansen, and further in view of Kondo (US 6,232,540 B1), hereinafter known
as Kondo. 

38. Stelovsky, Wang, and Hansen teach all the features as demonstrated above in the 
rejections of claims 1 & 25. What Stelovsky, Wang, and Hansen fail to teach is wherein a music 
analyzer is configured to segment the music automatically, comprising instructions for: 
establishing boundaries for the music sub-clips with a beat position between each of the music 
sub-clips [Claim 27]. However, Kondo teaches establishing boundaries (positions) for music 
sub-clips (rhythm sound source signals) at beat positions within the music (positions of attacks 
in the rhythm sounds, Abstract). Therefore, it would have been obvious to one of ordinary skill in 
the art, at the time the invention was made, to have divided the music sub-clips at beat positions 
within the music, as shown in Kondo, for use in the karaoke system of Stelovsky, in light of 
Wang and Hansen, in order to avoid occurrences of rhythm disorder in the rhythm sounds 
[Claim 27]. 

39. Claims 30 & 42 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Stelovsky, in view of Borden, IV et al. (US 2003/0200105 A1), hereinafter known as Borden IV. 

40. Stelovsky, Wang, and Hansen teach all the features of claims 25 & 40 above. What 
Stelovsky, Wang, and Hansen fail to teach is where the video analyzer or visual content 
analyzer is configured to access folders of home video and photographs containing content from 
which the sub-shots are derived [Claims 30 & 42]. However, Borden IV teaches a video analyzer
(user's data processing device) which can access folders of a customer's video or photographs
(MY PHOTOS homepage document, containing a user's uploaded images or video, Para.
0016-0017). Therefore, it would have been obvious to one of ordinary skill in the art, at the time the
invention was made, to have accessed a user's personal video and photo content for generating 
the sub-shots, in the karaoke device of Stelovsky, in light of Wang and Hansen, in order to 
attract potential customers to receive services by hosting their personal data [Claims 30 & 42]. 

41. Claims 35, 38, & 43 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Stelovsky, in view of Wang and Hansen, as applied to claims 25 & 40 above, and further in view 
of Osberger and Geigel. 

42. Stelovsky, Wang, and Hansen teach all the features of claims 25 and 40 above. What 
Stelovsky, Wang, and Hansen fail to teach is wherein a visual content analyzer is configured to 
reject photographs of low quality by detecting over and under exposure, overly homogeneous 
images, and blurred images [Claim 35]. Osberger teaches a visual analyzer (image processing 
algorithm) to detect overexposure and underexposure (contrast), overly homogeneous images 
(homogeneous regions, Column 3, Lines 6-15), and blurred images (areas of very high motion, 
Column 7, Lines 10-26). What Stelovsky, Wang, Hansen, and Osberger fail to teach is wherein 
the visual content analyzer rejects photographs which are underexposed, overexposed, overly 
homogeneous, or blurred [Claim 35]. However, Geigel teaches selection of the best image 
(Para. 0057). Therefore, it would have been obvious to one of ordinary skill in the art, at the time 
the invention was made, to have rejected images which are underexposed, overexposed, overly 
homogeneous, or blurred, in light of the teachings of Osberger and Geigel, in the entertainment 
system of Stelovsky, in light of Wang and Hansen, in order to discriminate images to present 
highly desirable visuals to a karaoke user [Claim 35]. 
43. What Stelovsky, Wang, and Hansen further fail to teach is wherein the means for
defining and selecting visual content sub-shots is a video analyzer configured for: detecting an 
attention area within a photograph; and creating a photo to video sub-shot based on the 
attention area, wherein the video includes panning and zooming [Claims 38 & 43]. Osberger 
teaches a visual analyzer (image processing algorithm) to detect an attention area within a 
photograph (Column 2, Lines 24-41), and wherein motion vectors are used by camera motion 
estimation algorithm to determine pan and zoom in a frame (Column 7, Lines 22-37). What 
Stelovsky, Wang, Hansen, and Osberger fail to teach is wherein the photo-to-video sub-shot
includes panning and zooming. However, Geigel teaches, in photography terms rather than
videography terms, panning the images (auto-cropping, Para. 0057) and zooming the images
(scaling, Para. 0122). Therefore, it would have been obvious to one of ordinary skill in the art, at
the time the invention was made, to have created a photo-to-video sub-shot based on a detected
attention area, including panning and zooming, in light of the teachings of Osberger and Geigel,
in the entertainment system of Stelovsky, in light of Wang and Hansen, in order to further refine
the content information of an image by focusing on the attention-attracting elements in the
photo-to-video sub-shot, when used as the background for karaoke entertainment [Claims 38 & 43].

44. Claim 11 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky,
Wang, and Hansen, as applied to claim 1 above, and further in view of Haitsma et al. (US 
2002/0178410 A1), hereinafter known as Haitsma. 

45. Stelovsky, Wang, and Hansen teach all the features of claim 1 as demonstrated above. 
What Stelovsky, Wang, and Hansen fail to teach is wherein the selecting uniformly distributed 
sub-shots comprises evaluating a normalized entropy of the sub-shots along a time line of video 
from which the sub-shots are obtained [Claim 11]. However, Haitsma teaches a hashing method 
for indexing video clips in a database, in which a normal distribution is calculated for video clips
to determine whether they are different quality versions of the same content (Two 3 seconds
audio clips (or two 30-frame video sequences) are declared similar if the Hamming distance
between the two derived hash blocks H_1 and H_2 is below a certain threshold T. This
threshold T directly determines the false positive rate P_f, i.e. the rate at which two audio
clips/video sequences are incorrectly declared equal (i.e. incorrectly in the eyes of a human
beholder): the smaller T, the smaller the probability P_f will be. On the other hand, a small
value T will negatively effect the false negative probability P_n, i.e. the probability that two
signals are "equal", but not identified as such. In order to analyze the choice of this threshold T,
we assume that the hash extraction process yields random i.i.d. (independent and identically
distributed) bits. The number of bit errors will then have a binomial distribution with parameters
(n, p), where n equals the number of bits extracted and p (= 0.5) is the probability that a "0" or "1"
bit is extracted. Since n (32 × 256 = 8192 for audio, 32 × 30 = 960 for video) is large in our
application, the binomial distribution can be approximated by a normal distribution with a mean
μ = np and standard deviation σ = √(np(1 - p)), Para. 0041).
This is understood to be a normalized entropy in the sense that the normal video quality is used 
to determine the similarity of sub-shots. Such a method would be used in the system and 
method of Stelovsky to determine whether a video clip or photograph duplicates the content of 
another except in quality. Therefore, it would have been obvious to one of ordinary skill in the 
art, at the time the invention was made, to have selected a uniform distribution of sub-shots 
along a timeline, as taught by Hansen, by analyzing the normalized entropy of the sub-shots, as 
taught by Haitsma, in light of the teachings of Wang and Hansen, in order to avoid the
non-uniform selection of duplicate sub-shot content in sub-shots that have distinct data
representations due to differing quality [Claim 11]. 
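For illustration only, the arithmetic in the Haitsma passage quoted above (Para. 0041) can be reproduced as follows. This is a minimal sketch that assumes, as Haitsma does, i.i.d. hash bits with p = 0.5. The helper names are hypothetical and are not drawn from Haitsma or from the present application. The false positive rate is taken here as the normal-approximation probability that two unrelated clips fall below the Hamming-distance threshold T.

```python
import math


def bit_error_distribution(n_bits, p=0.5):
    """Return (mean, std) of the normal approximation to Binomial(n_bits, p).

    Hypothetical helper: models the number of differing bits between the hash
    blocks of two unrelated clips, assuming i.i.d. hash bits.
    """
    mean = n_bits * p                          # mu = np
    std = math.sqrt(n_bits * p * (1 - p))      # sigma = sqrt(np(1 - p))
    return mean, std


def false_positive_rate(n_bits, threshold, p=0.5):
    """Hypothetical helper approximating P_f: the probability that two unrelated
    clips have a Hamming distance below `threshold` and are wrongly declared similar."""
    mean, std = bit_error_distribution(n_bits, p)
    z = (threshold - mean) / std
    # Standard normal CDF written in terms of the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))


if __name__ == "__main__":
    # Bit counts as given in the quoted passage: 32 x 256 (audio), 32 x 30 (video).
    for label, n in (("audio", 32 * 256), ("video", 32 * 30)):
        mu, sigma = bit_error_distribution(n)
        print(f"{label}: n={n}, mean={mu:.1f}, std={sigma:.2f}")
```

With Haitsma's figures, the audio case gives a mean of 4096 bit errors and a standard deviation of about 45.25, and the video case gives a mean of 480 and a standard deviation of about 15.49. Lowering T moves it further into the lower tail and so reduces P_f, consistent with the quoted passage.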
46. Claim 9 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, Wang,
and Hansen, as applied to claim 1 above, and further in view of Umeda (US 5,453,570 A), 
hereinafter known as Umeda. 

47. Stelovsky, Wang, and Hansen teach all the features of claim 1 as demonstrated above. 
What Stelovsky, Wang, and Hansen fail to explicitly teach is where the uniformly distributed 
sub-shots preserve a storyline represented by the visual content [Claim 9]. However, Umeda 
teaches a karaoke authoring apparatus in which the segmented video images may be a series 
of pictures, scenes, dynamic images, or still pictures presenting a story (Column 4, Lines
23-31). The sub-shots of Stelovsky, selected in a uniform distribution over a timeline of a video, as
taught by Hansen, would preserve a chronological story as taught by Umeda. Therefore, it 
would have been obvious to one of ordinary skill in the art, at the time the invention was made, 
to preserve a storyline represented by the visual content, as taught by Umeda, in the karaoke 
system and method of Stelovsky, in light of the teachings of Wang and Hansen, in order to 
avoid placing sub-shots out of their natural chronological order, such that an order of events is 
preserved logically [Claim 9]. 

48. Claim 45 is rejected under 35 U.S.C. 103(a) as being unpatentable over Stelovsky, 
Wang, and Hansen, as applied to claim 40 above, and further in view of Haitsma and Umeda. 

49. Stelovsky, Wang, and Hansen teach all the features of claim 40 as demonstrated above. 
Stelovsky teaches means for displaying assembled visual content comprising sub-shots with 
music sub-clips (Column 3, Lines 27-41). Hansen teaches wherein the means for defining and 
selecting visual content sub-shots is such that the sub-shots are uniformly distributed within the 
visual content (Para. 0085-88). What Stelovsky, Wang, and Hansen fail to teach is wherein the
means for defining and selecting visual content sub-shots, such that the sub-shots are uniformly
distributed within the visual content, is further configured for selecting uniformly distributed
sub-shots via evaluating normalized entropy of the sub-shots along a time line of visual content
from which the sub-shots were obtained [Claim 45]. However, Haitsma
teaches a hashing method for indexing video clips in a database, in which a normal distribution 
is calculated for video clips to determine whether they are different quality versions of the same 
content (Para. 0041). This is understood to be normalized entropy in the sense that the normal 
video quality is used to determine the similarity of sub-shots. Such a method would be used in 
the system and method of Stelovsky to determine whether a video clip or photograph duplicates 
the content of another except in quality. Therefore, it would have been obvious to one of 
ordinary skill in the art, at the time the invention was made, to have selected a uniform 
distribution of sub-shots along a timeline, as taught by Hansen, by analyzing the normalized 
entropy of the sub-shots, as taught by Haitsma, in light of the teachings of Wang and Hansen, in 
order to avoid the non-uniform selection of duplicate sub-shot content in sub-shots that have 
distinct data representations due to differing quality. What Stelovsky, Wang, and Hansen further 
fail to teach is wherein the means for displaying the assembled visual content comprising
sub-shots with music sub-clips is configured such that displaying the assembled visual content
preserves a storyline as represented by the visual content [Claim 45]. However, Umeda teaches 
a karaoke authoring apparatus in which the segmented video images may be a series of 
pictures, scenes, dynamic images, or still pictures presenting a story (Column 4, Lines 23-31). 
The sub-shots of Stelovsky, selected in a uniform distribution over a timeline of a video, as 
taught by Hansen, would preserve a chronological story as taught by Umeda. Therefore, it 
would have been obvious to one of ordinary skill in the art, at the time the invention was made, 
to preserve a storyline represented by the visual content, as taught by Umeda, in the karaoke 
system and method of Stelovsky, in light of the teachings of Wang, Hansen, and Haitsma, in 
order to avoid placing sub-shots out of their natural chronological order, such that an order of
events is preserved logically [Claim 45]. 



Response to Arguments 

50. Applicant's request for a formal interview if reply is anything other than allowance of the 
pending claims, see page 19 of remarks filed 5/21/2008, is denied at this time. The Examiner's 
position at present is that prosecution of the case would not be advanced without a formal 
response to new grounds of rejection presented herein. 

51. Applicant's request for withdrawal of the finality of the last Office action, see pages
19-20, is moot in view of the concurrently filed request for continued examination. To reiterate from
the final rejection of 11/21/2007, Applicant's amendments necessitated the new grounds of
rejection presented. Under present practice, second or any subsequent actions on the merits
shall be final, except where the examiner introduces a new ground of rejection that is neither
necessitated by applicant's amendment of the claims, nor based on information submitted in an
information disclosure statement filed during the period set forth in 37 CFR 1.97(c) with the fee
set forth in 37 CFR 1.17(p). Once a final rejection that is not premature has been entered in an
application/reexamination proceeding, it should not be withdrawn at the applicant's or patent 
owner's request except on a showing under 37 CFR 1.116(b). See MPEP 706.07. 

52. Applicant's arguments with respect to rejection of claims under 35 USC § 103, see pages
27-30, have been considered but are moot in view of the new ground(s) of rejection.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner 
should be directed to Nikolai A. Gishnock whose telephone number is (571) 272-1420. The
examiner can normally be reached on M-F 8:30a-5p. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Xuan M. Thai can be reached on 571-272-7147. The fax phone number for the 
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private 
PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you 
would like assistance from a USPTO Customer Service Representative or access to the 
automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 